
APPENDIX F 



COPYRIGHT 1998, LANGUAGE ANALYSIS SYSTEMS, INC. 



03/06 '98 12=41 I D : LANG . ANALYSIS SYSTEMS FAX : 703-834-6230 PAGE 2 



The Use of Phonological Information in 
Automatic Name Searching 



Richard Lutz, Ph.D. 
AIPA97 

March 25, 1997 



PROPRIETARY 
INFORMATION 




© Language Analysis Systems, Inc. 

2214 Rock Hill Road - Herndon, VA 20170






AIPA97 Paper Presentation: 
The Use of Phonological Information in Automatic Name Searching 

Richard Lutz, Ph.D. 

Language Analysis Systems, Inc.
at the Center for Innovative Technology
2214 Rock Hill Road - Herndon, VA 20170

Application Area: Automated Data Understanding 



Abstract 

This paper describes a two-year research effort to incorporate phonological information 
into automated name searching. Specifically, names represented by standard roman 
characters are automatically converted to multiple phonetic representations, based on sets 
of regular expressions that relate character strings to predictable sounds or sound 
sequences using a widely accepted phonetic notation system, the International Phonetic
Alphabet. Names are retrieved when there is an intersection of the regular expression of
the query name with regular expressions of names in a preprocessed database. Additional
similar names can be retrieved based on the articulatory characteristics of the sound
segments contained in the query and database names. 

1.0 Introduction 

Variation in the spellings of names is a persistent issue in the area of automated name 
searching in large databases (Hermansen, 1985). In general, the source of spelling 
variation of names can be analyzed and explained a posteriori . Predicting any individual 
spelling, however, remains problematic. Sources for spelling variation include:
keyboard-based data entry errors (e.g., hitting the wrong key; Genning for Henning), syntactic
variation (e.g., out-of-sequence given name and surname such as Richard Thomas for
Thomas Richard), morphological variation (e.g., truncated strings such as Rich or R for
Richard) and semantically-based variation (e.g., nativizations such as Goldwater for
Goldwasser). Of interest in the current paper is variation due to orthographic conventions
(e.g., English can represent the same sound in more than one way, as in Stephen ~ Steven)
and articulatory variation (e.g., the p in Thompson is a predictable spelling of Thomson
based on principles of articulation). While there are multiple sources of name variation,
this paper will present evidence 1) that the inherent ambiguity in the English use of roman 
characters can be mitigated by multiple mappings to unambiguous phonetic characters and 
2) that phonologically-similar names can be retrieved through the analysis of sounds into 
their articulatory features (i.e., place and manner of articulation). It is based on research 
conducted from September of 1995 through the present. 






2.0 Statement of Problem 

Character-based name searching relies on spelling as the basis for calculating distance 
between the query name and the database name. While spelling using roman characters is 
not unrelated to pronunciation, the relationship between the two is often inconsistent 
(Cummings 1988), and the orthographic information (i.e., conventions of the spelling 
system of a language) is at times misleading. Thus, one spelling may map to multiple 
pronunciations: Lutz can be pronounced to rhyme with puts, cuts or shoots, and at least
several additional non-English pronunciations are possible. The converse, of course, is
also the case: there may be a number of ways of representing a single pronunciation:
Lewis and Louis, for example, are usually pronounced identically by English speakers.

Character-matching techniques assume a reliable relationship between the orthographic
system and the pronunciation. This assumption is flawed because the mapping between
orthography and pronunciation, especially for English, is many-to-many: a given roman
character can stand for more than one sound, and an individual sound may be
represented in more than one way in the spelling system. Thus, the sound [f]¹ can be
written as f (Frank), ff (Taffy), ph (Phillip) or even gh (Rough). Conversely, the gh digraph
may represent the [f] sound of Rough, be silent (Dough), or represent [k] (in some
pronunciations of McClaughlin), [h] (in Monaghan), [g] (in McGhee) or [gh] (across
syllable breaks, as in Bighouse).
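The many-to-many relationship can be made concrete with a small sketch. The pairs below are taken from the examples in this section; the data structure and function names are illustrative assumptions, not part of the LAS system.

```python
from collections import defaultdict

# (spelling, sound) pairs from the examples in the text.
# An empty sound string stands for a silent spelling (as in Dough).
PAIRS = [
    ("f", "f"), ("ff", "f"), ("ph", "f"), ("gh", "f"),
    ("gh", ""), ("gh", "k"), ("gh", "h"), ("gh", "g"), ("gh", "gh"),
]

def build_indexes(pairs):
    """Index the pairs in both directions to expose the many-to-many mapping."""
    sounds_by_spelling = defaultdict(set)
    spellings_by_sound = defaultdict(set)
    for spelling, sound in pairs:
        sounds_by_spelling[spelling].add(sound)
        spellings_by_sound[sound].add(spelling)
    return sounds_by_spelling, spellings_by_sound

sounds_by_spelling, spellings_by_sound = build_indexes(PAIRS)
print(sorted(spellings_by_sound["f"]))   # four spellings for one sound
print(len(sounds_by_spelling["gh"]))     # six readings for one spelling
```

A character-based comparison sees only the left-hand column of such a table, which is why it cannot tell that ph and f may be the same sound.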

While much name variation can be traced to non-phonological issues, including syntax 
(order of name segments), aliases (John Doe for John Dillinger), morphological issues 
(Peg for Margaret) or data entry errors, many name variants can be traced to the 
relationship between orthography and pronunciation. Orally transmitted names, for 
instance, are especially prone to guesses on the part of the transcriber as to the "official"
(i.e., legal) spelling of an individual's name. Language contact can account for some
spelling variants as well (French Beauchamp and Anglicized Beecham), as can
transcription from non-roman character sets (Wachmi and Ouakhmi; Xie, Hsieh and Sye)
and sound change over time (e.g., Leigh is now pronounced the same as Lee). 

Additionally, regular (i.e., predictable) processes of speech produce variability in how a
name may be written. Thus, the presence of the letter p in Thompson is an artifact of poor
articulatory timing as the articulators move from a nasal [m] to an oral [s]. (The variant
spelling Thomson reflects a more etymologically justified spelling.)

3.0 Name Representation: Spelling 

LAS has been investigating the feasibility and utility of incorporating information about
the pronunciation of characters into the automated name searching process. The
researchers considered a number of options, including an acoustic level of representation
and character-based rules, and determined that searching of character-based databases
could be enhanced to include predictable language-based information about
character-to-sound mappings. Specifically, LAS recommended the use of the stock of phonetic
symbols known as the International Phonetic Alphabet (IPA), widely used by linguists to
represent the inventory of sounds used in the world's languages, and officially adopted by
the International Phonetic Association (Laver, 1994). The IPA uses a closed set of
symbols to transcribe speech in ways that are interpretable unambiguously by linguists,
regardless of the language being described. (See Appendix A.) For example, the symbol
[θ] (placed between brackets to indicate that it represents a sound rather than a letter)
always stands for a voiceless dental fricative, as in English thigh, while [ð] always
stands for the equivalent voiced dental fricative, as in English thy. Thus, IPA
disambiguates the English orthographic pattern of using th to stand for either sound: thigh
[θaj] versus thy [ðaj]. A name such as Gaither, of course, might be pronounced with either
of these sounds, and would thus have two IPA representations, one for each pronunciation:
[geiθr] versus [geiðr]. There is international agreement by members of the International
Phonetic Association, founded in 1889, as to the interpretation of IPA symbols. A
re-evaluation of the stock of symbols and special diacritic marks took place at the 1989 IPA
Convention in Kiel, and the efforts of the Association have resulted in the unambiguous
mapping of sounds onto IPA symbols that transcends individual speakers or languages
(Laver, ibid.).

¹ Square brackets indicate that a sound is being represented, rather than a spelling.

4.0 Mapping Spelling to Sound 

The issue of how to predict pronunciation of names from orthography is far from trivial.
Two key considerations are that:

• pronunciations of proper names are far less uniform than pronunciations of other
vocabulary. The pronunciation of the noun dough is more-or-less fixed in English,
despite the fossilized spelling that can be traced to an earlier pronunciation. The
pronunciation of the name Lough is far less certain: individuals named Lough may
well vary in their pronunciation of the family name and, even if all families named
Lough could reach a consensus, there is no assurance that those unfamiliar with their
consensus would guess that pronunciation. Additionally, some names retain old
spellings that map to modern pronunciations in highly improbable ways (e.g., British
Cholmondeley is commonly pronounced the same as Chumley). Claims of "correct"
pronunciations carry little weight in terms of name searching;

and:

• orthographies are language-specific. The letter x regularly maps
to [ks] and [z] in English (Alexander, Xenia), is regularly silent word-finally in French
orthography (LaCroix), stands for the velar fricative [x] or for [s] in Spanish (Mexico,
Xochimilco), and stands for [dz] or [dʒ] in Albanian (Hoxha). Additionally, standardized
transcription systems from non-roman systems to roman exploit the letter x to stand for
other, non-English sounds (e.g., Chinese Xie, Greek Xristos). Finally, any name may
be nativized to fit the "borrower" language: spellings of non-Anglo names may be
pronounced according to English orthographic conventions (e.g., French Duquesne
pronounced [dukwɛzni]).






5.0 Writing IPA Conversion Rules 

IPA is an effective notational system for representing pronunciation. LAS has written sets
of rules that relate spellings to sounds. The rules are language-based, with sets of rules 
operating for Arabic, Mandarin Chinese, Hispanic and Anglo names. The rules assume: 

• 26-character sets of roman letters, absent all diacritic markings, including accent marks 
or tone indicators; 

• English speakers, either naive or expert in the language of origin; 

• one spelling can map to multiple pronunciations. 

The rule sets were written to specific development databases made of single name 
elements, either surname or given, and taken from a variety of sources, including the U.S. 
Census list of the most frequent names in the U.S. and large U.S. databases of names from 
other countries. The names were manually tagged as "Arabic", "Mandarin Chinese",
"Hispanic" and "Anglo", where "Anglo" was loosely interpreted to include Western 
European Germanic names (including Dutch and German). A team of linguists used a 
variety of sources to determine possible pronunciations, including native speaker 
knowledge and textual information (e.g., Cummings, 1988, Hanks and Hodges, 1989, 
1990, Symonds, 1986). In general, rules were written broadly in order to ensure that most 
plausible pronunciations were captured. The Arabic and Mandarin Chinese rules included 
transcription variation (e.g., Chinese pinyin, Wade-Giles and Yale conventions of 
rendering Chinese names into roman script, as in Xie/Hsieh/Sye). The sample Anglo rule 
below is interpreted to mean that the letters sc preceded by anything and followed by the 
letters le can be pronounced as [s] or [sk] (e.g., Muscle and Mosclin):

sc / anything __ le -> [sk?]
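As a minimal sketch of how such a rule might be expressed with a regular expression, the snippet below models only the sample rule above: sc followed by le yields either s or sk. The function name and the pass-through of all other letters are assumptions made for illustration; the output is not a real phonetic transcription and this is not the LAS rule engine.

```python
import re

# Hypothetical encoding of the single sample rule: "sc" before "le"
# may be realized as "s" or as "sk". All other letters pass through.
SC_RULE = (re.compile(r"sc(?=le)"), ("s", "sk"))

def expand(name, rule=SC_RULE):
    """Return the set of alternative strings produced by one optional rule."""
    pattern, outputs = rule
    name = name.lower()
    if not pattern.search(name):
        return {name}
    return {pattern.sub(out, name) for out in outputs}

print(sorted(expand("Muscle")))  # ['muskle', 'musle']
print(sorted(expand("Smith")))   # ['smith']
```

Because each rule can yield several outputs, a full rule set turns one spelling into a set of candidate pronunciations rather than a single string.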

Rules were implemented using standard regular expression notation. The following table 
shows a sample query and the names returned from a data file containing the 88,799 most 
frequent surnames from the U.S. census: 



Search on SMITH

SMITH
SMYTH
SMITHE
SMIT
SMYTHE
SMIDT
SMIHT
SZMIDT

Figure 1 Search on name SMITH
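The retrieval step can be sketched as a set intersection: a database name is returned when at least one of its candidate pronunciations coincides with one of the query's. The three-entry rule table and the silent final e below are toy assumptions chosen to cover a few of the names in Figure 1; they are not the LAS rule sets.

```python
from itertools import product

# Toy, hypothetical spelling-to-sound table (not the LAS rules).
RULES = {
    "th": ("θ", "t"),  # "th" read as a fricative or as a plain stop
    "y": ("ɪ",),
    "i": ("ɪ",),
}

def pronunciations(name):
    """Expand a spelling into every pronunciation the toy rules allow."""
    s = name.lower()
    if s.endswith("e"):              # toy rule: a final "e" is silent
        s = s[:-1]
    options, i = [], 0
    while i < len(s):
        for size in (2, 1):          # prefer the two-letter rule when it applies
            chunk = s[i:i + size]
            if len(chunk) == size and chunk in RULES:
                options.append(RULES[chunk])
                i += size
                break
        else:
            options.append((s[i],))  # no rule: pass the letter through
            i += 1
    return {"".join(p) for p in product(*options)}

def search(query, database):
    """Return names whose pronunciation set intersects the query's."""
    q = pronunciations(query)
    return [n for n in database if q & pronunciations(n)]

names = ["SMITH", "SMYTH", "SMITHE", "SMIT", "SMYTHE", "JONES"]
print(search("SMITH", names))
```

In a production setting the pronunciation sets of database names would presumably be computed once in advance, in line with the preprocessed database mentioned in the abstract.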






As an example of the advantages of matching on IPA, consider a query on the name Lee.
Converted to the IPA string [li], exact matches with numerous spelling variants are
automatic, including Leigh and Li. Typical character-based matches will fail to retrieve
Leigh or Li, since the percentage of character overlap is minimal. Conversely, a standard
index matching system such as Soundex will categorize Lee and Li identically, but will still
miss Leigh, given the presence of a salient letter (g), and will retrieve a large number of
names of low relevance, including Lu, Liao, Low, Louie, Lahoya and Lehew.
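The Soundex behavior described above can be checked with a short implementation of the standard American Soundex coding. This is a sketch written for this comparison, not code from the system under discussion.

```python
# Standard American Soundex letter groups.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                 # h and w do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code                  # vowels reset the run
    return (first.upper() + "".join(digits) + "000")[:4]

for n in ("Lee", "Li", "Leigh", "Lu", "Liao", "Low", "Louie", "Lahoya", "Lehew"):
    print(n, soundex(n))
```

Every name in the list except Leigh receives the code L000; Leigh receives L200 because of the g, which is why it falls outside the group despite being pronounced like Lee.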

6.0 Phonological Processes 

In addition to predictable spelling variation, rules were written to account for predictable
articulatory processes (MacKay, 1987; Wolfram and Johnson, 1982). For example, the
variant spellings of Thomson ~ Thompson, Simson ~ Simpson, Demsey ~ Dempsey, etc.
can be accounted for by regular movement of the velum (i.e., the soft palate) from a
bilabial nasal [m] to an oral [s]. Production of an intrusive bilabial oral [p] is entirely a
result of the timing of the movement from nasal to oral articulation. LAS incorporated
likely articulatory variation into the IPA rule sets. Thus, a query of the name Thomson will
retrieve the variant Thompson as an exact match.
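One simple way to level the intrusive [p] is to map both variants onto a shared key. The sketch below does this on the spelling for readability; that is an assumption made for illustration, since the rules described here operate during conversion to IPA.

```python
import re

def collapse_intrusive_p(name):
    """Map 'mps' and 'ms' onto the same key by dropping a p between m and s."""
    return re.sub(r"mp(?=s)", "m", name.lower())

pairs = [("Thomson", "Thompson"), ("Simson", "Simpson"), ("Demsey", "Dempsey")]
for plain, intrusive in pairs:
    print(plain, intrusive,
          collapse_intrusive_p(plain) == collapse_intrusive_p(intrusive))
```

The lookahead restricts the rule to the nasal-to-[s] context named in the text, so a p before other sounds (as in Campbell) is left alone.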

7.0 Testing the Rule Sets 

To test the net effect of the Orthography-to-IPA rules, LAS conducted a controlled test,
randomly selecting 157 test names from a database of 55,545. The database
contained names that were from sources identified as Arabic, Mandarin Chinese, Hispanic
and Anglo (again, broadly defined). A native speaker of educated standard American
English was asked to record the 157 test names using pronunciations of his choosing. The
audio recordings were played for native speakers of American English, who were asked to
write one or more "likely" spellings for each name. In all, 3,689 variants were elicited in
this way. The variant spellings were
then used as test query names to calculate the retrieval rates of the original name spellings.
Overall, 69% of all variant spellings were retrieved by the IPA rules. However, qualitative
analysis of the results showed that approximately 23% of the variant names not retrieved
were due to perceptual mishearings of the recorded names. For example, the variant
spellings of the test name Baughn predictably included Bahn, Baun, and Bonn, and the IPA
Conversion Rules succeeded in mapping all to the original test name spelling. However, a
fourth elicited spelling, Vaughn, was not predicted, and the IPA Conversion Rules did not
map it to Baughn. The mishearing of [v] for [b] is not unusual, given the acoustics shared
by the two sounds. The IPA Conversion Rules, which include regular articulatory variants
such as Thomson/Thompson, were purposely not intended to retrieve perceptually similar
names during the current phase of research.






8.0 Fuzzy Matches: Articulatory Similarity 

At the heart of the research has been an effort to improve the automatic name searching
process by retrieving names that are similar to the query name. The IPA Conversion Rules
are able to capture a good deal of name variation that can be attributed to orthographic
sources, whether intralingual (e.g., Leigh/Lee) or interlingual (e.g., transcriptions to roman
orthography from Chinese: Xie ~ Hsieh ~ Sye). An additional goal has been to retrieve
names that are not phonologically identical to the query name, but that a careful analyst
would like to consider before abandoning a search. Thus, while spelling variants of the
name Benke include Behnke and Benck, the analyst might want to consider names that
seem phonologically close to the query name without being a predictable variant (e.g.,
Benge, Bankey and perhaps even names like Penke, Panke or Bentsche). While most
search algorithms permit fuzzy matches, these are invariably based on calculations of
number of characters shared. From the perspective of character matching, the letter b is as
different from p as it is from x, y or z. Thus, to permit retrieval of Penke for Benke is to
require retrieval of any name that differs from the query by the first character, including
Xenke, Yenke and Zenke. This clearly does not follow any phonologically reliable
principle, and significantly reduces the efficiency of automatic retrieval. Even indexed
systems, such as Soundex, group letters as either co-indexed or unrelated. Thus, while
Soundex is often called "phonetic" because it groups letters that share some phonological
characteristics, it cannot compare the degree to which two sounds, or indeed two names, are
related: it lacks granularity. Thus, Soundex would treat Benke, Penke and Panke as
identical rather than similar. Soundex would exclude Bentsche from the group because of
the letter t in the spelling, in effect treating Bentsche as being equally distant from Benke as
from Smith.
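The limitation of character counting can be illustrated with a plain edit distance (Levenshtein distance is used here as a representative character-based measure; the text does not name a specific one). All four candidates are exactly one substitution away from the query, so the measure cannot prefer Penke.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

for candidate in ("Penke", "Xenke", "Yenke", "Zenke"):
    print(candidate, edit_distance("Benke", candidate))
```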

It is clear, however, that sound segments can be analyzed in terms of their articulatory 
characteristics, and that some sounds fall into natural categories, such as vowels and 
consonants. Properties of sounds have been described in detail by a number of linguistic 
analyses according to place and manner of articulation (e.g., [p] and [b] are both articulated 
at the lips by complete blockage of the air flow and sudden release of pressure). One of the 
best known descriptions of phonetic classification is that of the American linguists 
Chomsky and Halle (1968). All the distinct sounds of American English can be described 
using 15 distinctive features (see Appendices B and C). By classifying sounds according
to these distinctive features, a fairly clear picture emerges of how close any two sounds are 
to one another. Thus, [p] and [b] differ by just one feature, voicing, while [p] and [f] differ 
by three and [p] and [v] by four. In general, articulatory distance can be counted in terms 
of how many articulatory characteristics sounds share. 
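Counting feature differences can be sketched as follows. The four-feature table is a deliberately reduced, hypothetical stand-in chosen so that the counts quoted above come out as stated; it is not the 15-feature table of the appendices, and the choice of "distributed" as the third feature separating [p] from [f] is an assumption.

```python
# Hypothetical reduced feature table (True = "+", False = "-").
FEATURES = {
    "p": {"voiced": False, "continuant": False, "strident": False, "distributed": True},
    "b": {"voiced": True,  "continuant": False, "strident": False, "distributed": True},
    "f": {"voiced": False, "continuant": True,  "strident": True,  "distributed": False},
    "v": {"voiced": True,  "continuant": True,  "strident": True,  "distributed": False},
}

def feature_distance(a, b):
    """Number of features on which two sounds disagree."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(fa[name] != fb[name] for name in fa)

print(feature_distance("p", "b"))  # 1
print(feature_distance("p", "f"))  # 3
print(feature_distance("p", "v"))  # 4
```

Unlike a Soundex group, the result is a graded distance, so [b] can be ranked closer to [p] than [v] is.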

LAS created a file of feature differences between pairs of sounds, essentially mapping 
phonetic features onto IPA notation. By relaxing the threshold of allowable differences,
increasingly distant sounds are retrieved. Thus, by permitting matches of IPA characters 
that are not exact matches, names are retrieved that are phonologically close. Even IPA 
sound-to-sound comparisons yield interesting sets of names for comparison. By relaxing 






retrievals to include single feature differences, a search of the name Smith now brings back 
these additional names: 



Search on SMITH
Feature Difference Threshold: 1

SMID
SMEAD
SNITH
SNIPE
SNIDE
SNEED
SNEAD
SNAPE
SNEATH

Figure 2 Fuzzy Search on Smith measuring Phonetic Feature Differences

Viewed in physiological terms, this is reasonable. Phonetic features refer to salient
characteristics of articulation, so that differences generally reflect how likely it is that
one sound would be articulated in place of another. There are numerous additional
factors, of course, that ought to be considered in measuring how similar two names are to
one another articulatorily.
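A thresholded comparison of whole names might look like the sketch below. Several things here are assumptions: the threshold is applied to each aligned pair of segments, names are compared only when they have the same number of segments, the feature values are a reduced hypothetical table, and the transcriptions of Benke, Penke and Zenke are supplied by hand for illustration.

```python
import math

# Hypothetical reduced features: (voiced, continuant, strident, coronal).
FEATURES = {
    "b": (True, False, False, False),
    "p": (False, False, False, False),
    "z": (True, True, True, True),
}

def segment_distance(a, b):
    """Feature differences between two segments; unknown pairs never match."""
    if a == b:
        return 0
    if a not in FEATURES or b not in FEATURES:
        return math.inf
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))

def fuzzy_match(query, candidate, threshold):
    """True when every aligned segment pair is within the threshold."""
    if len(query) != len(candidate):
        return False
    return all(segment_distance(a, b) <= threshold
               for a, b in zip(query, candidate))

database = {"Benke": "bɛŋki", "Penke": "pɛŋki", "Zenke": "zɛŋki"}
for threshold in (0, 1):
    hits = [n for n, ipa in database.items() if fuzzy_match("bɛŋki", ipa, threshold)]
    print(threshold, hits)
```

With a threshold of 0 only the exact match is returned; raising it to 1 admits Penke but still excludes Zenke, which is the graded behavior that character matching cannot provide.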

9.0 Final Sorting of Names Retrieved 

The names retrieved by searches on phonetic features may not all be of equal relevance to 
the query name. Additional factors are under consideration to sort names retrieved, based
on a variety of phonological characteristics. 

9.1 Sonority Level 

The differences in phonetic features generally express the amount of effort needed to move 
articulators from one sound to another. The sounds [p], [t] and [k] form a natural class of 
voiceless stop consonants — identical in manner of articulation. All are extremely 
common in the world's languages, and are among the first acquired by children. They 
differ in place of articulation, and this is reflected in feature differences. However, manner 
of articulation is probably a better measure of energy expenditure than is place of 
articulation: voiceless stops are all extremely low in sonority, that is, the amount of energy 
needed to produce a sound. Vowels, on the other hand, require much more effort: they, in 
essence carry the sound wave. In order for feature differences to effectively measure level 
of effort required, differences should be weighted according to sonority level. In general 
terms, sounds fall into nine levels of sonority, with voiceless stops [p], [t] and [k] at the
low end and the vowels [ɑ] as in father and [æ] as in fan at the most sonorous end
(Ladefoged, 1982). Sorts of names retrieved ought to consider the sonority value of 
sounds. This might be accomplished by weighting phonetic features or by a more 
complicated comparison of sonority level contours of names or syllables. 
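A comparison of sonority contours could be sketched as below. Only the two ends of the scale (voiceless stops lowest, low vowels highest) come from the text; the intermediate levels, the segment inventory and the distance measure are placeholders assumed for illustration.

```python
# Hypothetical nine-level scale; only levels 1 and 9 are taken from the text.
SONORITY = {
    "p": 1, "t": 1, "k": 1,   # voiceless stops (lowest)
    "b": 2, "d": 2, "g": 2,   # assumed: voiced stops
    "f": 3, "s": 3,           # assumed: voiceless fricatives
    "v": 4, "z": 4,           # assumed: voiced fricatives
    "m": 5, "n": 5,           # assumed: nasals
    "l": 6,                   # assumed: laterals
    "r": 7, "w": 7,           # assumed: rhotics and glides
    "i": 8, "u": 8,           # assumed: high vowels
    "ɑ": 9, "æ": 9,           # low vowels (highest)
}

def contour(segments):
    """Sonority level of each segment, in order."""
    return [SONORITY[s] for s in segments]

def contour_distance(a, b):
    """Sum of level differences for two equally long segment strings."""
    if len(a) != len(b):
        raise ValueError("contours must have the same length")
    return sum(abs(x - y) for x, y in zip(contour(a), contour(b)))

print(contour("pæt"))                  # [1, 9, 1]
print(contour_distance("pæt", "bæd"))  # 2
```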

9.2 Syllabification 

Additionally, in languages that time segments based in part on stress patterns, it is 
reasonable to compare stressed syllables to one another. In the following example, names 
have been aligned in terms of substrings, in this case corresponding to syllables:

Chester:             [ˈtʃɛs . tɚ]
Chesterton:          [ˈtʃɛs . tɚ . tən]
Winchester:  [ˈwɪn . ˌtʃɛs . tɚ]

Both in terms of articulatory effort (sonority) and psychological salience, it would be
misleading to treat all three occurrences of the substring [tʃɛs] as equivalent: stress clearly
must be included in the equation. LAS has written a syllabifier that automatically parses 
English IPA strings, including names, according to a set of rules. Future research will 
investigate the possibility of ranking similar names through analysis at the syllabic level. 
Syllabic level analysis has the strength of lining up comparable substructures of names. 
All syllables share the same internal structures (i.e., onset of the syllable, nucleus, and 
coda), and alignment by syllable enables meaningful comparisons of internal structures of 
names (where a period represents the syllable break): 

Linda   [l ɪ n . d ə]
Lisa    [l i _ . s ə]

Note that in the above example, the coda (i.e., end) of the first syllable in Linda is filled by
[n] but empty in Lisa, as indicated by the underscore. A meaningful comparison of the two
names would compare the [n] of Linda to an empty coda rather than to the [s] in the onset 
(i.e., beginning) of the second syllable of Lisa. 
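The slot-by-slot comparison can be sketched by representing each syllable as an onset, nucleus and coda. The syllabified transcriptions of Linda and Lisa are entered by hand here and are assumptions; the LAS syllabifier itself is not reproduced.

```python
from itertools import zip_longest
from typing import NamedTuple

class Syllable(NamedTuple):
    onset: str
    nucleus: str
    coda: str      # empty string stands for an empty coda

EMPTY = Syllable("", "", "")

def slot_differences(a, b):
    """Compare same-position syllables slot by slot and list the mismatches."""
    diffs = []
    for index, (sa, sb) in enumerate(zip_longest(a, b, fillvalue=EMPTY)):
        for slot in Syllable._fields:
            x, y = getattr(sa, slot), getattr(sb, slot)
            if x != y:
                diffs.append((index, slot, x, y))
    return diffs

linda = [Syllable("l", "ɪ", "n"), Syllable("d", "ə", "")]
lisa = [Syllable("l", "i", ""), Syllable("s", "ə", "")]
for diff in slot_differences(linda, lisa):
    print(diff)
```

The [n] of Linda is reported against the empty coda of Lisa's first syllable, and the [d] against the [s] in the second onset, rather than [n] being compared to [s] as a flat left-to-right alignment would do.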

9.3 Position in Name 

Some weight ought to be given to absolute initial position in names. Many indexed 
systems, including Soundex, key names to the initial letter. This is, of course, problematic, 
since the initial letter may be silent or part of a digraph (e.g., Knox, Philip). However, 
indexing on the first sound, or at least considering the first sound as more significant than 
sounds in other positions may be warranted. This, like syllable-level comparisons, will 
probably be a factor in final sorting of names retrieved. 

9.4 Non-Phonological Factors in Sorting of Names Retrieved 

Certainly, it must be acknowledged that non-phonological levels of analysis may be critical 
to any useful definition of similarity. Morphological units - word parts that may contain 






semantic information, including prefixes and suffixes such as Mc-, -ton and -sky - are
likely sources of variation. Thus, Lubin and Lubinsky are critically related (in terms of
their roots), while Lubin, Rubin and Lupine are very close in terms of articulation. The
morphological factor could be handled efficiently with a look-up list of morphological 
elements, but this remains outside the current scope of this project. 
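A look-up list of this kind could be as simple as the sketch below. The two affixes listed are examples from the text; the function name and the single-pass stripping strategy are assumptions, and a real list would need to be far larger and language-aware.

```python
# Hypothetical look-up lists; entries are examples mentioned in the text.
PREFIXES = ("mc",)
SUFFIXES = ("sky",)

def root(name):
    """Strip at most one known prefix and one known suffix from a name."""
    s = name.lower()
    for prefix in PREFIXES:
        if s.startswith(prefix) and len(s) > len(prefix):
            s = s[len(prefix):]
            break
    for suffix in SUFFIXES:
        if s.endswith(suffix) and len(s) > len(suffix):
            s = s[:-len(suffix)]
            break
    return s

print(root("Lubinsky"), root("Lubin"))  # lubin lubin
```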

Similarly, orthography itself might play a useful role in the final sort of names retrieved. 
The following names retrieved for a fuzzy search on the name Bucket have been sorted 
using a simple sort on letters. 



Search on BUCKET
Feature Difference Threshold: 1

BYXKETT
BEXKJET
BIXKET
BYQYET
BEXKETT
BIXKETT
BYXHHEJT
BOGYET
BA0YET
BOXHAT
BYXHITE
BEXKQ1TH
BAXOT
BOOKOYT
BAXOTE
BOY0YET
BEKHIT
BOQXYTT
BE&YETTE

Figure 3 Search on Bucket Sorted by Spelling

Current plans are for a final ranking of names retrieved based on a combination of factors, 
including number of syllables, stress, weighting of features by sonority levels and name- 
initial segments. 

10.0 Conclusions 

In sum, automatic name searching can benefit in three ways from incorporation of 
phonological information: 






• leveling differences due exclusively to orthographic mapping; 

• leveling differences due to predictable phonological processes, such as intrusive 
consonants; and 

• retrieving additional names that contain phonologically similar sounds to those of the 
query name. 

Having retrieved phonologically relevant names, a phonologically-enhanced name search 
engine can then sort names using a multiple factor weighting scheme. 

LAS views this technology as extremely promising. It offers a tool to enhance current
automatic name searching, increasing the chances of retrieving name variants that
character-based systems miss by retrieving and sorting names in a phonologically principled way.






Appendix A: Descriptions of IPA Symbols

Phonetic symbol  Description                                   Example

p                voiceless bilabial stop                       p in the English name Peter
b                voiced bilabial stop                          b in the English name Buddy
ɸ                voiceless bilabial fricative                  f in the Japanese name Fujimori
β                voiced bilabial fricative                     b in the Spanish word saber
m                bilabial nasal                                m in the English name Mary
ɥ                voiced rounded palatal approximant            u in the French name Nuit
f                voiceless labio-dental fricative              f in the English name Fred
v                voiced labio-dental fricative                 v in the English name Vera
ɱ                voiced labio-dental nasal                     n in the Italian word anfora
t                voiceless alveolar stop                       t in the English name Ted
d                voiced alveolar stop                          d in the English name Doug
θ                voiceless apico-dental fricative              th in the English name Theodore
ð                voiced apico-dental fricative                 th in the English name Rather
s                voiceless alveolar fricative                  s in the English name Sam
z                voiced alveolar fricative                     z in the English name Zachary
n                voiced alveolar nasal                         n in the English name Nathan
l                voiced alveolar lateral                       l in the English name Linda
ɬ                voiceless alveolar lateral fricative          ll in the Welsh name Llewellyn
ɮ                voiced alveolar lateral fricative             dhl in the Zulu word dhla (to eat)
ɹ                voiced alveolar continuant                    r in the English name Richard
r                voiced apico-alveolar trill                   r in the Spanish name Ricardo
ɾ                voiced alveolar flap                          tt in the English name Ritter
ʈ                voiceless retroflex stop                      as in the Arabic name Tariq
ɖ                voiced retroflex stop                         as in the Arabic word difda' (frog)
ʂ                voiceless retroflex fricative                 as in the Arabic name Sabir
ʐ                voiced retroflex fricative                    as in the Arabic name Dhafir
ɳ                voiced retroflex nasal                        Marathi (India)
ɭ                voiced retroflex lateral approximant          Marathi (India)
ɽ                voiced retroflex flap                         d as in Hindi dal (lentil stew)
ʃ                voiceless palato-alveolar fricative           sh in the English name Sheila
ʒ                voiced palato-alveolar fricative              z in the English word azure
ɕ                voiceless alveo-palatal fricative             x as in the Chinese name Xia
ʑ                voiced alveo-palatal fricative                ź in the Polish word źle
tʃ               voiceless palato-alveolar affricate           ch in the English name Charlie
dʒ               voiced palato-alveolar affricate              j in the English name Jennifer
ɲ                voiced palatal nasal                          ñ in the Spanish word Doña
ʎ                voiced palatal lateral approximant            ll in the Spanish word calle (street)
k                voiceless velar stop                          k in the English name Kim
g                voiced velar stop                             g in the English name Gary
x                voiceless velar fricative                     j in the Spanish name Jose
ɣ                voiced velar fricative                        g in the Spanish word luego (later)
ŋ                voiced velar nasal                            ng in the English name Bing






Appendix A: Descriptions of IPA Symbols (Continued) 



Phonetic 

cvm hol 

J J III Kr KI 1 


Description 


Example ° 


/P 

V 


vuicciCbb velar lateral 


1 in the Polish Walesa 




vuilcicdj lauiLJ" vc ittr approx i nidni 


wh as in the English name White (for 
some speakers) 


w 


voippH h l^hial nt^r\rr» v itii atif 


w in the English name Wayne 


q 


voiceless uvular stop 


as in the Arabic name Qasim 


Q 


vvjiL-cu uvuiar siop 


Eskimo and Tehrani Persian 




vuiL-ciebb uvuiar rncative 


ch as in the German word Buch 




voiced uvular fricative 


r in some Parisian pronunciations of the 
French name RcnJe 


N 


; j _ 

voiced uvular nasal 


n in the Eskimo word eNima (melody) 


R 


vnipprf iiviilar trill 

vuit-cti uvuiar inn 


r in the French name RenJe 




voiceless pharyngeal fricative 


h as in the Arabic name Muhammad 


rv\ 


voiced pharyngeal fricative 


as in the Arabic name Sa'ad 




vuiwClCsb gluual Stop 


tt as in the English name Sutton or the 
word mitten 


h 


voiceless plottal fricative 


n. in the English name Henry 


D 


voiced glottal fricative 


h as in English between voiced sounds, 
as in the word manhood 


y 


high front rounded vowel 


u in me rrencn word lunc (moon) 


• 


high central unrounded vowel 


as in the Russian word s#n (son) 




Hiuh central rounded vnw^l 


u as in the Norwegian hus 


O 


hich bnck unrounrlpH vr^w^J 

* ■ ■ 1 ■ w r\ u 1 1 1 \j LI 1 1 LJ L^U V U W Vl 


u as in the Japanese name Kazu 


u 


nifin back rounded vowpI 


ou as in the French word tout 




UDDer mid-front ron nnVH 


6 as in the German name Sch nfeld 




r r uiiu UaLK UIIIUUMUCU VOWcj 


as in the Shan (Burma)word 'ko (salt) 


0 


upper mid-back rounded 


o as in the English name Mona 


{ 


?Pfni-n1ftn TTAnf lint* AMrlf4a/J 

acini ni^ji iron t unrounuea vowel 


y as in the English name Lynn 


E 


IfilVPr 1Y\ trl-frrtnt imrruxnAaA 

i w w iniu-iiuiii uiiiuunueci 


c as in the English name Deborah 




lowef-mio* front rmmrlorf lfAtif^l 


oeu as in the French word oeuf (egg) 


n 


lOWPr-ITl iH hflpW linrrumHtt/H w/mhaI 
ivs*vi.i illiu UaLN. UlllUtlJluCU VOWCI 


u as in the English name Tuppcrman 


© 


lower-mid hark imrniinH^I 


o as in the English name Ford 


F 


oncn front unrnnnHpH i/auia] 


a as in the English name Hal 


O 


open central unrounded vowel 


a as in the Portuguese worrl riai-a ffnr} 
■ ui iu^uvjw vy ui u (Jala \ I kJI ) 


£3 


low front unrounded vowel 


a as in the French word patte (paw) 


a 


low centra! unrounded vowel 


a as in the French name Delatre or the 
word p>te (paste or dough) 


® 


low back rounded vowel 


o as in the British English word hot 


★ 


mid central unrounded vowel 


e & a as in the English name Belinda 


U 


semi-high back rounded vowel 


u as in the English name Butch 


e 


upper-mid front unrounded 


a as in the English name [viable 


i 


high front unrounded vowel 


first e in the English name Pete 




ɚ 


rhotacized mid vowel 


er as in the English name Heather 










tɕ 


voiceless alveo-palatal affricate 


j as in the Chinese name Jin 




tɕʰ 


voiceless aspirated alveo-palatal affricate 


q as in the Chinese name Qiu 




ts 


voiceless unaspirated dental affricate 


ts as in the Chinese name Tsang 




ts' 


voiceless aspirated dental affricate 


c as in the Chinese name Cao 




ʘ 


bilabial click 


as in Southern Bushman languages 




ǀ 


dental (alveolar) click 


as in Bushman 




! 


palatal click 


as in Bushman 






ǂ 


palato-alveolar click 


as in Hottentot 




ǁ 


alveolar lateral click 


as in Bushman, Zulu 








Appendix B: Description of Phonetic Features 



A. Major class features: 

1. Syllabic 

Forms the central peak of a syllable. Vowels are usually +syllabic, consonants are 
usually -syllabic, but some (like [ l ]) may be syllabic (as in "riddle"). 

2. Sonorant 

Minimal constriction in the mouth. Vowels, as well as [ n ], [ m ], [ r ], [ l ], [ w ] are all 
+sonorant. Most other consonants are -sonorant. 

3. Consonantal 

Obstruction along a central point in the mouth. All English sounds except vowels and 
glides ([ w ] and [ y ]) are +consonantal. 

B. Manner of Articulation Features: 

4. Continuant 

Continued air movement through the mouth during sound production. This feature 
contrasts fricative sounds like [ f ] and [ v ] with non-continuants like [ p ] and [ b ]. 

5. Strident 

Narrow obstruction through which air escapes, producing hissing or "white noise". [ s ], 
[ z ], [ f ], [ v ] and the sounds in church and judge are +strident. This is the most 
acoustically-based feature in this list. 

6. Delayed Release 

Gradual release of air. In English, it is used to distinguish the sounds in church and 
judge from [ t ] and [ d ]. 

7. Nasal 

Soft palate at the back of the mouth is lowered and air goes into nose. In English, [ n ], 
[ m ] and [ ŋ ] (the final sound in king) are +nasal. 

8. Lateral 

Side(s) of tongue lowered so that air escapes along side, as in English [ l ]. 

C. Place of articulation: 

9. Anterior 

Obstruction of mouth anywhere from gum ridge forward to lips. English [ p ], [ b ], 
[ m ], [ f ], [ v ], and [ ð ] (as in the) are all +anterior. 

10. Coronal 

Front of the tongue raised. The sounds [ t ] and [ d ] are +coronal. Sounds like [ k ] and 
[ g ] are -coronal. 

11. High 

Body of tongue raised. [ j ] (as in yellow) and the vowel [ i ] (as in feet) are +high. 







12. Low 

Body of tongue lowered. The vowels [ æ ] as in back and [ ɑ ] as in father are +low. 

13. Back 

Body of tongue moved back. The sounds [ k ] and [ g ] and the vowel [ u ] as in boot are 
+back. 

14. Tense 

Root of tongue muscle tensed. The vowel [ i ] (as in feet) is +tense. The vowel [ ɪ ] as 
in fit is -tense. 

15. Round 

Lips pursed or rounded. English vowel [ u ] (as in boot) is +round, while [ i ] (as in 
beet) is -round. 






Appendix C: Phonetic Features for [ p ], [ b ] and [ f ] 



Phonetic Features      [ p ]    [ b ]    [ f ]

syllabic
sonorant
consonantal              +        +        +
anterior                 +        +        +
coronal
high
low
back
strident
delayed release
voiced
nasal
lateral
round
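
The kind of comparison this table supports can be sketched in Python. This is a minimal illustration under stated assumptions, not the system described in this paper. The names PLUS_FEATURES, differing_features and feature_distance are hypothetical. The feature values are the conventional ones for [ p ], [ b ] and [ f ] rather than values read from the table: only the + values are listed, every other feature is assumed to be -, and continuant is taken from Appendix B.

```python
# Hypothetical sketch: each segment is stored as the set of features for
# which it is marked +; any feature not listed is assumed to be -.
# These values are conventional assumptions, not data taken from the paper.
PLUS_FEATURES = {
    "p": frozenset({"consonantal", "anterior"}),
    "b": frozenset({"consonantal", "anterior", "voiced"}),
    "f": frozenset({"consonantal", "anterior", "continuant", "strident"}),
}


def differing_features(a, b):
    """Return the features on which segments a and b disagree."""
    # Symmetric difference: features marked + for exactly one of the two.
    return PLUS_FEATURES[a] ^ PLUS_FEATURES[b]


def feature_distance(a, b):
    """Count the disagreeing features; a smaller count means more similar."""
    return len(differing_features(a, b))


if __name__ == "__main__":
    for pair in (("p", "b"), ("p", "f"), ("b", "f")):
        print(pair, feature_distance(*pair), sorted(differing_features(*pair)))
```

Under these assumptions [ p ] and [ b ] differ only in voicing, [ p ] and [ f ] differ in continuant and strident, and [ b ] and [ f ] differ in all three, so a search that tolerates a one-feature difference would accept p/b variants before p/f or b/f variants.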






References 

Chomsky, Noam and Halle, Morris. The Sound Pattern of English, Harper & Row, New York, 1968. 

Cummings, D. W. American English Spelling, The Johns Hopkins University Press, London, 1988. 

Hanks, Patrick, and Hodges, Flavia. A Dictionary of First Names, Oxford University Press, Oxford, 1990. 

Hanks, Patrick, and Hodges, Flavia. A Dictionary of Surnames, Oxford University Press, Oxford, 1989-90. 

Hermansen, John C. Automatic Name Searching in Large Data Bases of International Names, unpublished 
dissertation, Georgetown University, Washington, D.C., 1985. 

Ladefoged, Peter. A Course in Phonetics, Harcourt Brace Jovanovich, Publishers, San Diego, 1982. 

Laver, John. Principles of Phonetics, Cambridge University Press, 1994. 

MacKay, Ian. Phonetics: The Science of Speech Production, Little, Brown, and Company, Boston, 1987. 

Symonds, Martin A. Mandarin Pronunciation, Taipei Language Institute, Taipei, 1986. 

Wolfram, Walt and Johnson, Robert. Phonological Analysis: Focus on American English, The Center for 
Applied Linguistics & Harcourt Brace Jovanovich, Inc., 1982. 



